5 research outputs found
Modeling Human Visual Search in Natural Scenes: A Combined Bayesian Searcher and Saliency Map Approach
Finding objects is essential for almost any daily-life visual task. Saliency models have been useful for predicting fixation locations in natural images during free exploration. However, it is still challenging to predict the sequence of fixations during visual search. Bayesian observer models are particularly suited for this task because they represent visual search as an active sampling process. Nevertheless, how they adapt to natural images remains largely unexplored. Here, we propose a unified Bayesian model for visual search guided by saliency maps as prior information. We validated our model with a visual search experiment in natural scenes. We showed that, although state-of-the-art saliency models performed well in predicting the first two fixations in a visual search task (90% of the performance achieved by humans), their performance degraded to chance afterward. Therefore, saliency maps alone could model bottom-up first impressions, but they were not enough to explain scanpaths when top-down task information was critical. In contrast, our model led to human-like performance and scanpaths, as revealed by: first, the agreement between targets found by the model and the humans on a trial-by-trial basis; and second, the scanpath similarity between the model and the humans, which makes the behavior of the model indistinguishable from that of humans. Altogether, the combination of deep neural network-based saliency models for image processing and a Bayesian framework for scanpath integration proves to be a powerful and flexible approach to model human behavior in natural scenarios.
Fil: Bujía, Gastón Elián. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina.
Fil: Sclar, Melanie. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina.
Fil: Vita, Sebastián Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina.
Fil: Solovey, Guillermo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Cálculo; Argentina.
Fil: Kamienkowski, Juan Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina.
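The active-sampling idea described in the abstract above can be illustrated with a toy ideal observer on a grid: the saliency map supplies the prior over target location, each fixation yields a noisy distance-dependent observation, and the posterior is updated before choosing the next fixation. This is a minimal sketch with made-up grid size, noise model, and target location, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a coarse grid of candidate target locations.
# A (here random) saliency map acts as the prior over target location.
H, W = 8, 8
saliency = rng.random((H, W))
prior = saliency / saliency.sum()
target = (5, 2)  # hidden target cell (illustrative)

def observe(fix, noise=1.0):
    """Noisy evidence: larger (closer to 0) when fixating near the target."""
    d = np.hypot(fix[0] - target[0], fix[1] - target[1])
    return -d + noise * rng.standard_normal()

posterior = prior.copy()
fix = np.unravel_index(posterior.argmax(), posterior.shape)
scanpath = [fix]
for _ in range(6):
    z = observe(fix)
    # Gaussian likelihood of observation z under every candidate location:
    # if the target were at (i, j), the expected observation is -distance.
    ii, jj = np.indices((H, W))
    d = np.hypot(ii - fix[0], jj - fix[1])
    lik = np.exp(-0.5 * (z + d) ** 2)
    posterior *= lik
    posterior /= posterior.sum()
    # Next fixation: maximum a posteriori location (ideal-observer policy).
    fix = np.unravel_index(posterior.argmax(), posterior.shape)
    scanpath.append(fix)
```

In the paper's framing, replacing the uniform-random saliency here with a deep saliency model's output is what injects bottom-up image information into the Bayesian searcher.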
BotPercent: Estimating Bot Populations in Twitter Communities
Twitter bot detection is vital in combating misinformation and safeguarding
the integrity of social media discourse. While malicious bots are becoming more
and more sophisticated and personalized, standard bot detection approaches are
still agnostic to the social environments (henceforth, communities) in which
the bots operate. In this work, we introduce community-specific bot detection,
estimating the percentage of bots given the context of a community. Our method
-- BotPercent -- is an amalgamation of Twitter bot detection datasets and
feature-, text-, and graph-based models, adjusted to a particular community on
Twitter. We introduce an approach that performs confidence calibration across
bot detection models, which addresses generalization issues in existing
community-agnostic models targeting individual bots and leads to more accurate
community-level bot estimations. Experiments demonstrate that BotPercent
achieves state-of-the-art performance in community-level Twitter bot detection
across both balanced and imbalanced class distribution settings, outperforming
existing approaches and presenting a less biased estimator of Twitter bot
populations within the communities we analyze. We then analyze bot rates in
several Twitter groups, including users who engage with partisan news media,
political communities in different countries, and more. Our results reveal that
the presence of Twitter bots is not homogeneous; rather, it exhibits a
spatiotemporal distribution with considerable heterogeneity that should be
taken into account for content moderation and social media policymaking. The
implementation of BotPercent is available at
https://github.com/TamSiuhin/BotPercent.
Comment: Accepted to Findings of EMNLP 2023
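The calibrate-then-aggregate idea in the abstract above can be sketched as follows: each detector's raw logit is passed through a per-model temperature-scaled sigmoid, the calibrated probabilities are averaged per account across models, and the per-account probabilities are averaged over the community to estimate a bot percentage. The model names, temperatures, and logits below are illustrative assumptions, not BotPercent's actual values (in the real system the temperatures would be fit on held-out data):

```python
import math

def calibrated_prob(logit, temperature):
    """Temperature-scaled sigmoid: a simple confidence-calibration scheme."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# Raw bot logits from three hypothetical detectors for four accounts.
logits = {
    "feature_model": [2.1, -1.3, 0.4, 3.0],
    "text_model":    [1.5, -0.8, 0.9, 2.2],
    "graph_model":   [2.8, -2.0, -0.1, 2.5],
}
# Per-model temperatures (illustrative; learned on held-out data in practice).
temperatures = {"feature_model": 1.5, "text_model": 2.0, "graph_model": 1.2}

n_accounts = len(next(iter(logits.values())))
per_account = []
for i in range(n_accounts):
    # Average calibrated probabilities across models for this account.
    probs = [calibrated_prob(logits[m][i], temperatures[m]) for m in logits]
    per_account.append(sum(probs) / len(probs))

# Community-level estimate: mean per-account bot probability, as a percentage.
bot_percentage = 100.0 * sum(per_account) / n_accounts
```

Averaging calibrated probabilities rather than hard 0/1 predictions is what makes the community-level estimate less sensitive to each individual model's decision threshold.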
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
Theory of mind (ToM) evaluations currently focus on testing models using
passive narratives that inherently lack interactivity. We introduce FANToM, a
new benchmark designed to stress-test ToM within information-asymmetric
conversational contexts via question answering. Our benchmark draws upon
important theoretical requisites from psychology and necessary empirical
considerations when evaluating large language models (LLMs). In particular, we
formulate multiple types of questions that demand the same underlying reasoning
to identify an illusory or false sense of ToM capabilities in LLMs. We show that
FANToM is challenging for state-of-the-art LLMs, which perform significantly
worse than humans even with chain-of-thought reasoning or fine-tuning.
Comment: EMNLP 2023. Code and dataset can be found here:
https://hyunw.kim/fantom
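The "same underlying reasoning, multiple question types" evaluation described above can be sketched with an all-or-nothing scoring rule: a model is credited for a conversation only when every linked question variant probing the same belief is answered correctly, which separates genuine reasoning from format-specific guessing. The question-type names and data below are hypothetical, not FANToM's actual schema:

```python
# Each dict is one linked question set: several formats probing the same
# underlying belief, with per-variant correctness (illustrative data).
sets = [
    {"mcq": True,  "free_form": True,  "list_answerers": True},
    {"mcq": True,  "free_form": False, "list_answerers": True},
    {"mcq": True,  "free_form": True,  "list_answerers": False},
]

# Naive per-question accuracy overstates capability...
per_question_acc = sum(v for s in sets for v in s.values()) / sum(len(s) for s in sets)

# ...while the joint all-or-nothing metric only credits fully consistent sets.
all_correct_acc = sum(all(s.values()) for s in sets) / len(sets)
```

Here the per-question accuracy is 7/9 while the joint accuracy is only 1/3, mirroring how a model can look competent on individual questions yet fail the stricter consistency test.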
Faith and Fate: Limits of Transformers on Compositionality
Transformer large language models (LLMs) have sparked admiration for their
exceptional performance on tasks that demand intricate multi-step reasoning.
Yet, these models simultaneously show failures on surprisingly trivial
problems. This raises the question: are these errors incidental, or do they
signal more substantial limitations? In an attempt to demystify Transformers,
we investigate the limits of these models across three representative
compositional tasks -- multi-digit multiplication, logic grid puzzles, and a
classic dynamic programming problem. These tasks require breaking problems down
into sub-steps and synthesizing these steps into a precise answer. We formulate
compositional tasks as computation graphs to systematically quantify the level
of complexity, and break down reasoning steps into intermediate sub-procedures.
Our empirical findings suggest that Transformers solve compositional tasks by
reducing multi-step compositional reasoning into linearized subgraph matching,
without necessarily developing systematic problem-solving skills. To round off
our empirical study, we provide theoretical arguments on abstract multi-step
reasoning problems that highlight how Transformers' performance will rapidly
decay with increased task complexity.
Comment: 10 pages + appendix (21 pages)
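The computation-graph framing described above can be sketched for one of the paper's tasks, multi-digit multiplication: nodes are single-digit operations, edges are data dependencies, and graph size and depth serve as complexity proxies. The graph construction and node naming below are an illustrative assumption (grade-school partial products plus a balanced addition tree), not the paper's exact formulation:

```python
def multiplication_graph(x_digits, y_digits):
    """Build a dependency graph (node -> parent nodes) for
    grade-school multi-digit multiplication."""
    graph = {}
    partials = []
    # One single-digit multiplication per digit pair.
    for i in range(x_digits):
        for j in range(y_digits):
            node = f"mul_{i}_{j}"
            graph[node] = [f"x_{i}", f"y_{j}"]
            partials.append(node)
    # Combine partial products with a balanced binary addition tree.
    level = 0
    while len(partials) > 1:
        nxt = []
        for k in range(0, len(partials) - 1, 2):
            node = f"add_{level}_{k}"
            graph[node] = [partials[k], partials[k + 1]]
            nxt.append(node)
        if len(partials) % 2:        # odd element carries to the next level
            nxt.append(partials[-1])
        partials = nxt
        level += 1
    return graph, partials[0]

def depth(graph, node):
    """Longest dependency chain from an input leaf to this node."""
    if node not in graph:
        return 0                     # input digit (leaf)
    return 1 + max(depth(graph, p) for p in graph[node])

g2, root2 = multiplication_graph(2, 2)
g4, root4 = multiplication_graph(4, 4)
```

Doubling the number of digits grows both the node count and the reasoning depth of the graph, which is the kind of systematic complexity scaling the paper uses to quantify where Transformer performance decays.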